The aim of this project is to find out if portfolios made of 15 randomly selected stocks from the S&P500 Large-Cap Index can outperform stock-picking experts and index investing. The portfolios were optimized based on the mean-variance framework developed by Harry Markowitz for simplicity, with the goal of maximizing the ex-post Sharpe Ratio during the optimization period.
The project is meant to be a fun/thought experiment, and is not meant to disprove of index or active investing. It does not make use of any statistical methods to prove that random selection, expertise or index investing is a better choice.
library(PortfolioAnalytics) # For portfolio optimization and analysis
library(RColorBrewer) # For color palettes in plots
library(ROI) # For ROI solver in portfolio optimization
library(ROI.plugin.glpk) # Part of the ROI solver for linear optimization
library(ROI.plugin.quadprog) #Part of the ROI solver for quadratic optimization
library(tidyquant) # For quantmod and PerformanceAnalytics functions
library(tidyverse) # For dplyr and ggplot2 functions (data manipulation and plotting)
The components of S&P500 can be obtained using
tidyquant::tq_index()
.
# Only require ticker and maybe name of company, which is columns 1 and 2 of query result
<- tidyquant::tq_index(x = "SP500")[,1:2]
sp500
sp500
<- sp500$symbol tickers
Using quantmod::getSymbols()
, I
retrieved the daily adjusted closing prices of the tickers from Yahoo
Finance, starting from 1 January 2015 to 1 July 2022.
<- as.Date("2015-01-01")
startdate <- as.Date("2022-07-01")
enddate
<- NULL
prices
for (t in tickers) { # Use for loop to get prices of all tickers
<- NULL
tmp
<- try( # Use try function to skip loop and return error message when price cannot be retrieved
tmp expr = {quantmod::getSymbols(Symbols = t, src = "yahoo", auto.assign = FALSE,
from = startdate, to = enddate, periodicity = "daily")[,6]},
silent = TRUE
)
if(inherits(tmp, "try-error")) {cat("ERROR:", t, sep = " "); next}
names(tmp) <- t
<- cbind(prices, tmp)
prices }
## ERROR: BRK.BERROR: BF.B
# Check dimension of object, start and end date of data collected
dim(prices); start(prices); end(prices)
## [1] 1887 502
## [1] "2015-01-02"
## [1] "2022-06-30"
The download has failed for BRK.B (Berkshire Hathaway)
and BF.B (Brown-Forman Corporation)
because the tickers in
Yahoo Finance are BRK-B
and BF-B
respectively.
I used the correct tickers to retrieve their data and added them to
prices
. I also checked which stocks had NA over the
specified period and removed them from prices
.
<- cbind(quantmod::getSymbols(Symbols = "BRK-B", src = "yahoo", auto.assign = F,
prices from = startdate, to = enddate, periodicity = "daily")[,6],
::getSymbols(Symbols = "BF-B", src = "yahoo", auto.assign = F,
quantmodfrom = startdate, to = enddate, periodicity = "daily")[,6],
prices)
colnames(prices)[1:2] <- c("BRK-B", "BF-B")
# Check which stocks have NA over the specified period
data.frame(rbind(which(colSums(is.na(prices)) > 0)), row.names = "Number of NAs")
# Keep only stocks that do not have NAs over the specified period
<- prices[, which(colSums(is.na(prices)) == 0)]
prices
dim(prices) # Left with 482 stocks
## [1] 1887 482
15 tickers were sampled from prices
without replacement
each time, which will be used to create 10 portfolios at the end. The
set.seed()
allows the random sample to be
reproducible.
<- list()
samp_stocks
for (i in 1:10) {
set.seed((50+i)^2)
<- prices[, sample(x = ncol(prices), size = 15, replace = FALSE)]
samp_stocks[[i]]
names(samp_stocks)[i] <- paste("S", i, "_prices", sep = "")
}
# Check stocks sampled into each portfolio
data.frame(sapply(samp_stocks, colnames), row.names = 1:15)
Discrete returns are used as it can be added across assets, unlike
log-returns. Returns can be calculated using
PerformanceAnalytics::Return.calculate()
but since samp_stocks
is a list object, the function needs
to be used together with lapply()
.
<- lapply(samp_stocks, function(x) {
stock_returns na.omit(PerformanceAnalytics::Return.calculate(x, method = "discrete"))
})
names(stock_returns) <- paste("S", c(seq(1:10)), "_returns", sep = "")
data.frame(sapply(stock_returns, dim), row.names = c("Number of rows", "Number of columns"))
# Show first 3 elements in stock_returns, first 5 columns and first 4 rows of data
lapply(stock_returns[1:2], function(x) {
head(x[,1:5], n = 4)
})
## $S1_returns
## WY V AMGN NOC SCHW
## 2015-01-05 0.000000000 -0.022073753 -0.01188338 -0.021097814 -0.03342164
## 2015-01-06 -0.001107650 -0.006443594 -0.03221695 0.005510299 -0.03663123
## 2015-01-07 0.003049679 0.013398154 0.03492451 0.031631465 0.01954536
## 2015-01-08 0.010779294 0.013412803 -0.00360209 0.023197606 0.02614153
##
## $S2_returns
## DG MCD MOS SBAC CINF
## 2015-01-05 -0.006498952 -0.011044559 -0.01791585 -0.015769986 -0.014338451
## 2015-01-06 -0.012656687 0.001843226 0.01334830 -0.005218771 -0.007666568
## 2015-01-07 0.012098692 0.017424265 0.00526882 0.013713724 0.013668776
## 2015-01-08 -0.010815288 0.003723105 0.01179311 0.009987207 0.021301508
Using
PortfolioAnalytics::portfolio.spec()
,
portfolio specification for the mean-variance framework can be created.
Because each portfolio contains different assets, we need to create 10
specifications for optimization.
<- list()
portspecs
for (s in 1:10) {
# Initialize portfolio specification
<- PortfolioAnalytics::portfolio.spec(assets = names(stock_returns[[s]]))
portspecs[[s]]
# Sum of weights constrained to 1, can also specify as type = "full investment"
<- PortfolioAnalytics::add.constraint(portspecs[[s]],
portspecs[[s]] type = "weight_sum",
min_sum= 1, max_sum = 1)
# Weight constraint on each stock, max is 15% of portfolio
<- PortfolioAnalytics::add.constraint(portspecs[[s]],
portspecs[[s]] type="box",
min=0, max=0.15)
# Objective to minimize risk based on variance (function will default to standard deviation as measure of risk)
<- PortfolioAnalytics::add.objective(portspecs[[s]],
portspecs[[s]] type = "risk",
name = "var")
# Adding a return objective, thus we maximize mean return per unit of risk
<- PortfolioAnalytics::add.objective(portspecs[[s]],
portspecs[[s]] type = "return",
name = "mean")
# Add name to specification to distinguish them
names(portspecs)[s] <- paste("P", s, "_spec", sep = "")
}
# Example of what portspecs contains
$P1_spec portspecs
## **************************************************
## PortfolioAnalytics Portfolio Specification
## **************************************************
##
## Call:
## PortfolioAnalytics::portfolio.spec(assets = names(stock_returns[[s]]))
##
## Number of assets: 15
## Asset Names
## [1] "WY" "V" "AMGN" "NOC" "SCHW" "CPRT" "DFS" "MCHP" "BF-B" "CINF"
## More than 10 assets, only printing the first 10
##
## Constraints
## Enabled constraint types
## - weight_sum
## - box
##
## Objectives:
## Enabled objective names
## - var
## - mean
The portfolios are optimized with
PortfolioAnalytics::optimize.portfolio.rebalancing()
,
which re-optimizes the portfolios every specified period. The main
arguments are rebalance_on
,
training_period
and
rolling_window
. We can decide the
frequency of rebalancing (“months”, “quarters”, and “years”), the number
of periods used for optimization (specify an integer matching frequency
of data, in this case daily) and the number of periods to roll the
window used for optimization. I also added the argument
maxSR = TRUE
to maximize the Sharpe Ratio
of the portfolios (the demo can be found here).
The function
PortfolioAnalytics::optimize.portfolio()
only allows for single-period optimization.
<- list()
optports
for (p in 1:10) {
<- PortfolioAnalytics::optimize.portfolio.rebalancing(R = stock_returns[[p]],
optports[[p]] portfolio = portspecs[[p]],
optimize_method = "ROI",
rebalance_on = "quarters",
training_period = 252,
rolling_window = 125,
maxSR = TRUE)
names(optports)[p] <- paste("P", p, "_optimal", sep = "")
}
# Example of what optports contains
$P1_optimal optports
## **************************************************
## PortfolioAnalytics Optimization with Rebalancing
## **************************************************
##
## Call:
## PortfolioAnalytics::optimize.portfolio.rebalancing(R = stock_returns[[p]],
## portfolio = portspecs[[p]], optimize_method = "ROI", maxSR = TRUE,
## rebalance_on = "quarters", training_period = 252, rolling_window = 125)
##
## Number of rebalancing dates: 26
## First rebalance date:
## [1] "2016-03-31"
## Last rebalance date:
## [1] "2022-06-30"
##
## Annualized Portfolio Rebalancing Return:
## [1] 0.164621
##
## Annualized Portfolio Standard Deviation:
## [1] 0.206573
I used 252 periods for training/optimization and rolled the window forward every 125 days, so the portfolios are re-optimized based on previous 1 year (252 trading days) data every half-year (125 trading days) and re-balanced every quarter.
Since the portfolio was re-optimized every half-year, the optimal weights selected using the mean-variance framework with a target of maximizing ex-post Sharpe Ratio should change over time. The weights of the Portfolio 1 are plotted below.
chart.Weights(object = optports[[1]],
colorset = c(RColorBrewer::brewer.pal(n = 7, name = "Dark2"), RColorBrewer::brewer.pal(n = 8, name = "Accent")),
main = "Portfolio 1 Component Weights")
To calculate daily portfolio returns, the daily weights of the
portfolio are required and can be extracted from the optimal portfolios
using
PortfolioAnalytics::extractWeights()
.
<- lapply(optports, FUN = PortfolioAnalytics::extractWeights)
port_weights
names(port_weights) <- paste("P", c(seq(1:10)), "_weights", sep = "")
# Show first 3 elements in stock_returns, first 5 columns and first 4 rows of data each
lapply(port_weights[1:3], function(x) {
head(x[,1:5], n = 4)
})
## $P1_weights
## WY V AMGN NOC SCHW
## 2016-03-31 2.498051e-02 1.356383e-16 3.739019e-17 0.15 -7.108513e-17
## 2016-06-30 -1.280954e-17 5.705632e-17 -2.210002e-17 0.15 1.664727e-16
## 2016-09-30 1.288686e-16 -2.789032e-17 7.170911e-04 0.15 5.654058e-17
## 2016-12-30 -1.669187e-17 1.544942e-17 -1.372749e-17 0.00 1.500000e-01
##
## $P2_weights
## DG MCD MOS SBAC CINF
## 2016-03-31 1.500000e-01 1.500000e-01 -9.226609e-18 -1.203994e-18 1.500000e-01
## 2016-06-30 1.500000e-01 2.711126e-16 -4.044221e-18 -1.985114e-18 1.500000e-01
## 2016-09-30 -2.852867e-18 -5.927237e-18 -2.150322e-17 9.359765e-17 1.500000e-01
## 2016-12-30 1.678395e-17 3.777998e-02 -2.580449e-18 -4.229156e-17 2.581569e-18
##
## $P3_weights
## GS PAYX REGN LRCX CHRW
## 2016-03-31 -4.559622e-17 1.500000e-01 -6.344878e-18 0.1500000 1.500000e-01
## 2016-06-30 9.257308e-17 1.500000e-01 -2.345520e-19 0.0999999 1.500000e-01
## 2016-09-30 1.573552e-16 1.500000e-01 -6.485471e-17 0.1096971 -1.372825e-16
## 2016-12-30 1.500000e-01 -1.080069e-16 8.916761e-18 0.1500000 -9.512732e-17
PerformanceAnalytics::Return.portfolio()
can be used to calculate the daily portfolio returns.
<- NULL
port_returns
for (r in 1:10) {
<- cbind(port_returns,
port_returns ::Return.portfolio(R = stock_returns[[r]],
PerformanceAnalyticsweights = port_weights[[r]],
geometric = TRUE))
names(port_returns)[r] <- paste("P", r, "_returns", sep = "")
}
data.frame(head(port_returns))
Since the aim of this project was to determine if portfolios of random stocks could outperform stock-picking experts and index investing, relevant proxies for benchmarking are required.
The proxies are returns calculated from the Daily Adjusted Closing Prices of popular index or actively-managed Exchange-Traded Funds that have sizeable Total Assets Under Management screened from etfdb.com. The actively-managed ETFs were filtered to have an inception date before 01 January 2015.
The index ETFs chosen are:
The actively-managed ETFs chosen are:
<- c("SPY", "VTI", "QQQ", "IWM", "IJR", "IJH")
index_tickers
<- NULL
index_prices
for (i in index_tickers) {
<- cbind(index_prices,
index_prices ::getSymbols(Symbols = i, src = "yahoo", auto.assign = FALSE,
quantmodfrom = startdate, to = enddate, periodicity = "daily")[,6])
}
colnames(index_prices) <- index_tickers
<- c("ARKK", "ARKG", "EMLP", "ARKW", "ARKQ")
active_tickers
<- NULL
active_prices
for (i in active_tickers) {
<- cbind(active_prices,
active_prices ::getSymbols(Symbols = i, src = "yahoo", auto.assign = FALSE,
quantmodfrom = startdate, to = enddate, periodicity = "daily")[,6])
}
colnames(active_prices) <- active_tickers
The discrete method was also applied to the calculation of benchmark returns for comparison of the cumulative return performance.
<- na.omit(PerformanceAnalytics::Return.calculate(prices = index_prices, method = "discrete"))
index_returns
data.frame(head(index_returns, 4))
<- na.omit(PerformanceAnalytics::Return.calculate(prices = active_prices, method = "discrete"))
active_returns
data.frame(head(active_returns, 4))
::chart.CumReturns(R = cbind(port_returns[,1:5], index_returns),
PerformanceAnalyticsgeometric = TRUE,
main = "Cumulative Returns of Portfolios and Index ETFs",
colorset = RColorBrewer::brewer.pal(n = 11, name = "Spectral"),
plot.engine = "plotly")
::chart.CumReturns(R = cbind(port_returns[,6:10], index_returns),
PerformanceAnalyticsgeometric = TRUE,
main = "Cumulative Returns of Portfolios and Index ETFs",
colorset = RColorBrewer::brewer.pal(n = 11, name = "Spectral"),
plot.engine = "plotly")
QQQ
ETF that tracks the NASDAQ-100
Index, which has a heavy focus on the technology sector, had higher
cumulative returns than the other Index ETFs. The only portfolio that
managed to outperform the QQQ
ETF is Portfolio 2, which is
made up of
::chart.Drawdown(R = cbind(port_returns[,1:5], index_returns),
PerformanceAnalyticsgeometric = TRUE,
main = "Drawdowns of Portfolios and Index ETFs",
colorset = RColorBrewer::brewer.pal(n = 11, name = "Spectral"),
plot.engine = "plotly")
::chart.Drawdown(R = cbind(port_returns[,6:10], index_returns),
PerformanceAnalyticsgeometric = TRUE,
main = "Drawdowns of Portfolios and Index ETFs",
colorset = RColorBrewer::brewer.pal(n = 11, name = "Spectral"),
plot.engine = "plotly")
In general, the IWM
, IJH
and
IJR
ETFs have larger drawdowns, since they track the
small-cap and mid-cap companies in the U.S., which tends to be more
volatile. The portfolios created from 10 randomly selected stocks had
similar drawdowns to the ETFs.
table.AnnualizedReturns(R = cbind(port_returns, index_returns), scale = 252, Rf = 0.025/252, geometric = TRUE, digits = 4)
table.CAPM(Ra = port_returns, Rb = index_returns$SPY, scale = 252, Rf = 0.025/252, digits = 4)
table.DownsideRisk(R = cbind(port_returns, index_returns), scale = 252, Rf = 0.025/252, MAR = 0.07/252, digits = 4)
For the metrics that require risk-free rate (Rf), I chose an annual rate of 2.5%. For the Minimum Acceptable Return (MAR) in calculating Downside Deviation, I used an annual rate of 7%, which is approximately the long-run inflation-adjusted return of the S&P500. Portfolio 2 had the highest annualized return and Sharpe Ratio and the lowest maximum drawdown.
::chart.CumReturns(R = cbind(port_returns[,1:5], active_returns),
PerformanceAnalyticsgeometric = TRUE,
main = "Cumulative Returns of Portfolios and Active ETFs",
colorset = RColorBrewer::brewer.pal(n = 10, name = "Spectral"),
plot.engine = "plotly")
::chart.CumReturns(R = cbind(port_returns[,6:10], active_returns),
PerformanceAnalyticsgeometric = TRUE,
main = "Cumulative Returns of Portfolios and Active ETFs",
colorset = RColorBrewer::brewer.pal(n = 10, name = "Spectral"),
plot.engine = "plotly")
The ARK ETFs (ARKK, ARKG, ARKW, ARKQ) had high cumulative returns during the 2020 to 2021 period, but were the most volatile as well. Portfolio 2, which did the best against Index ETFs, had better cumulative returns in 2022 than the ARK ETFs.
::chart.Drawdown(R = cbind(port_returns[,1:5], active_returns),
PerformanceAnalyticsgeometric = TRUE,
main = "Drawdowns of Portfolios and Active ETFs",
colorset = RColorBrewer::brewer.pal(n = 11, name = "Spectral"),
plot.engine = "plotly")
::chart.Drawdown(R = cbind(port_returns[,6:10], active_returns),
PerformanceAnalyticsgeometric = TRUE,
main = "Drawdowns of Portfolios and Active ETFs",
colorset = RColorBrewer::brewer.pal(n = 11, name = "Spectral"),
plot.engine = "plotly")
The ARK ETFs had the largest drawdowns since 2021. In comparison, the
EMLP
ETF and portfolios made of random stocks had lower
drawdowns.
table.AnnualizedReturns(R = cbind(port_returns, active_returns), scale = 252, Rf = 0.025/252, geometric = TRUE, digits = 4)
table.CAPM(Ra = cbind(port_returns, active_returns), Rb = index_returns$SPY, scale = 252, Rf = 0.025/252, digits = 4)
table.DownsideRisk(R = cbind(port_returns, active_returns), scale = 252, Rf = 0.025/252, MAR = 0.07/252, digits = 4)
The benchmark returns (Rb) in
table.CAPM()
used SPY returns to give a
comparison of the alpha and beta of the portfolios and actively-managed
ETFs. Again, Portfolio 2 had the highest annualized return and Sharpe
Ratio and the lowest maximum drawdown. The ARK ETFs would have
outperformed Portfolio 2 if an investor bought at the start of 2015 and
sold in early 2021 at its peak.
Some caveats about this experiment:
I would like to stress that this project was mainly carried out as a fun/thought experiment. It was not meant to disprove the expertise of active managers, the strategies employed by these funds for stock selection or the power of investing into an index ETF. From the results of this simple experiment, only 1 portfolio out of 10 (Portfolio 2) managed to achieve greater cumulative returns over the years than the rest of the pack. There would be an incredible amount of luck involved to select 15 stocks from the S&P500 universe of stocks that can provide such returns, if they were to be chosen at random.